DOMAIN WORD TRANSLATION BY SPACE-FREQUENCY ANALYSIS OF CONTEXT LENGTH HISTOGRAMS - Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE Inte
Author
Abstract
We report a new statistical feature relating a bilingual word pair in a non-parallel English-Chinese corpus. We find that the lengths of the context segments of a word are closely correlated with those of its translation, even when the corpus is non-parallel, i.e., consists of monolingual texts that are not translations of each other. The context segment length histogram of a word has a characteristic pattern that corresponds to that of its translation: if a word appears most frequently in long segments, its translation is most likely to occur in long segments as well. One way to match these histograms is to first extract their salient shape characteristics by space-frequency analysis and then match them against each other using dynamic time warping. The results of matching can be used in combination with other statistical features to bootstrap a word or term translation algorithm from non-parallel corpora.

[...] some previous approaches is their orientation toward European language pairs. They cannot be applied to language pairs such as Chinese and English. A new approach that is extendable to other language pairs is needed. This paper demonstrates a pattern matching method that uses a statistical feature, the context length histogram, to correlate pairs of translated words. It also shows how space-frequency analysis is used to match such word pair signals for translation. The experiments use the bilingual transcription of the Hong Kong Legislative Council debates as the input corpus [6]. The data are from 1988-1992: the first 73618 sentences are taken from the English text, and the next 73618 sentences from the Chinese text. There are no overlapping sentences between the texts. The topics of these debates focus on the political and social issues of Hong Kong.

2. ALGORITHM OVERVIEW
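The histogram-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a "context segment" is a run of whitespace-separated tokens between punctuation marks (the excerpt does not define it), it assumes the Chinese text has already been word-segmented, and it omits the space-frequency analysis step, applying plain dynamic time warping directly to the normalized histograms. The function names `segment_length_histogram` and `dtw_distance` are hypothetical.

```python
import re


def segment_length_histogram(text, word, max_len=20):
    """Normalized histogram of the token lengths of segments containing `word`.

    Assumption: segments are delimited by punctuation, and tokens by whitespace.
    Segments longer than `max_len` tokens are counted in the last bin.
    """
    hist = [0.0] * max_len
    for segment in re.split(r"[.,;:!?]", text.lower()):
        tokens = segment.split()
        if word in tokens:
            hist[min(len(tokens), max_len) - 1] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Under these assumptions, a candidate translation pair would be scored by computing one histogram per word from its own monolingual text and comparing the two with `dtw_distance`; a lower distance suggests more similar histogram shapes.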
Similar resources
OPTIMAL PHASE KERNELS FOR TIME-FREQUENCY ANALYSIS - Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE Inte
We consider the design of kernels for time-frequency distributions through the phase, rather than amplitude, response. While phase kernels do not attenuate troublesome cross-components, they can translate them in the time-frequency plane. In contrast to previous work on phase kernels that concentrated on placing the cross-components on top of the auto-components, we set up a “don’t care” region ...
ESTIMATION OF RIGID BODY MOTION AND SCENE STRUCTURE FROM IMAGE SEQUENCES USING A NOVEL EPIPOLAR TRAN - Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE Inte
We present a new algorithm for estimation of rigid body motion parameters and scene structure from monocular image sequences. A novel Epipolar Transform is utilized to preserve the relevant information from mean squared displaced frame difference (DFD) surfaces and thus overcomes the inherent limitations of feature correspondence methods. Our algorithm conducts a coarse-to-fine search in 5-dime...
Knowledge-Based Transformation Ordering - Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE Inte
We propose a two-step approach for transformation ordering which combines the use of optimization-intensive CAD techniques with knowledge-based user-driven search strategy. The first step is development of basic building blocks which target small sets of transformations which are well suited for optimization intensive CAD treatment. Next, transformation orderings are developed using knowledge a...
Design of Efficient FIR Filters with Cyclotomic Polynomial Prefilters Using Mixed Integer Linear Pro - Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE Inte
The Cyclotomic Polynomial (CP) prefilter design problem is formulated as an optimization problem with linear objective functions by applying the logarithm to the transfer function of the CP prefilter. This problem is then solved by mixed integer linear programming (MILP). Design examples demonstrate that this method leads to more efficient cascaded FIR prefilter-equalizers than existing methods.
Algorithms for Blind Equalization with Multiple Antennas Based on Frequency Domain Subspaces - Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE Inte
This paper considers the problem of recovering an unknown signal transmitted over an unknown (but stationary) multipath channel, and received by a narrowband array with unknown calibration. Unlike recently proposed multichannel blind equalization techniques, the methods described herein employ a model based on physical channel parameters rather than unstructured multiple output FIR filters. T...